LIKELIHOOD BASED INFERENCE FOR MONOTONE
RESPONSE MODELS

By Moulinath Banerjee 

University of Michigan 

The behavior of maximum likelihood estimates (MLEs) and the 
likelihood ratio statistic in a family of problems involving pointwise 
nonparametric estimation of a monotone function is studied. This 
class of problems differs radically from the usual parametric or semi- 
parametric situations in that the MLE of the monotone function at 
a point converges to the truth at rate $n^{1/3}$ (slower than the usual
$\sqrt{n}$ rate) with a non-Gaussian limit distribution. A framework for
likelihood based estimation of monotone functions is developed and 
limit theorems describing the behavior of the MLEs and the likeli- 
hood ratio statistic are established. In particular, the likelihood ratio 
statistic is found to be asymptotically pivotal with a limit distribution
that is no longer $\chi^2$ but can be explicitly characterized in terms
of a functional of Brownian motion. Applications of the main results 
are presented and potential extensions discussed. 

1. Introduction. A common problem in nonparametric statistics is the 
need to estimate a function, like a density, a distribution, a hazard or a re- 
gression function. Background knowledge about the statistical problem can 
provide information about certain aspects of the function of interest, which, 
if incorporated in the analysis, enables one to draw meaningful conclusions 
from the data. Often, this manifests itself in the nature of shape restric- 
tions (on the function). Monotonicity, in particular, is a shape restriction 
that shows up very naturally in different areas of application like reliability, 
renewal theory, epidemiology and biomedical studies. Consequently, mono- 
tone functions have been fairly well studied in the literature and several 
authors have addressed the problem of maximum likelihood estimation un- 
der monotonicity constraints. We point out some of the well-known ones. 
One of the earliest results of this type goes back to Prakasa Rao [21], who
derived the asymptotic distribution of the Grenander estimator (the MLE of 
a decreasing density); Brunk [4] explored the limit distribution of the MLE 
of a monotone regression function, Groeneboom and Wellner [9] studied the 
limit distribution of the MLE of the survival time distribution with current 
status data, Huang and Zhang [14] and Huang and Wellner [13] obtained the 
asymptotics for the MLE of a monotone density and a monotone hazard with 
right censored data, and Wellner and Zhang [27] deduced the large sample 
theory for a pseudo-likelihood estimator for the mean function of a counting 
process. A common feature of these monotone function problems that sets 
them apart from the spectrum of regular parametric and semiparametric 
problems is the slower rate of convergence ($n^{1/3}$) of the maximum likelihood
estimates of the value of the monotone function at a fixed point (recall that
the usual rate of convergence in regular parametric/semiparametric problems
is $\sqrt{n}$). What happens in each case is the following: If $\hat\psi_n$ is the MLE
of the monotone function $\psi$, then provided that $\psi'(z)$ does not vanish,

(1.1)    $n^{1/3}(\hat\psi_n(z) - \psi(z)) \to_d C(z) Z,$

where the random variable $Z$ is a symmetric (about 0) but non-Gaussian
random variable and $C(z)$ is a constant depending upon the underlying
parameters in the problem and the point of interest $z$. In fact,
$Z \equiv \arg\min_h (W(h) + h^2)$, where $W(h)$ is standard two-sided Brownian motion on the line. The
distribution of Z was analytically characterized by Groeneboom [7] and 
more recently its distribution and functionals thereof have been computed 
by Groeneboom and Wellner [10]. 
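
To get a feel for the limit variable $Z$ in (1.1), the following sketch approximates draws of $\arg\min_h (W(h) + h^2)$ by simulating two-sided Brownian motion on a finite grid. The function name `simulate_chernoff_argmin`, the truncation to $[-3, 3]$, the grid step and the number of draws are illustrative assumptions and are not taken from the paper; quantiles for actual inference should come from the tabulated values in [10].

```python
import numpy as np

def simulate_chernoff_argmin(n_draws=200, half_width=3.0, step=1e-3, seed=0):
    """Approximate draws of Z = argmin_h (W(h) + h^2) on a grid over
    [-half_width, half_width] (truncation and step are assumptions)."""
    rng = np.random.default_rng(seed)
    m = int(round(half_width / step))      # grid points on each side of 0
    h_pos = step * np.arange(1, m + 1)
    grid = np.concatenate((-h_pos[::-1], [0.0], h_pos))
    draws = np.empty(n_draws)
    for k in range(n_draws):
        # Two independent Brownian paths started at 0, glued at the origin.
        right = np.cumsum(rng.normal(0.0, np.sqrt(step), m))
        left = np.cumsum(rng.normal(0.0, np.sqrt(step), m))
        w = np.concatenate((left[::-1], [0.0], right))
        draws[k] = grid[np.argmin(w + grid**2)]
    return draws

draws = simulate_chernoff_argmin()
print("sample mean:", draws.mean(), "sample sd:", draws.std())
```

The drift $h^2$ dominates the Brownian fluctuations quickly, which is why a modest truncation window is usually adequate for a rough picture; a finer grid and more draws would be needed for anything beyond illustration.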

In this paper we study a class of conditionally parametric models, of the 
covariate-response type, where the conditional distribution of the response 
given the covariate comes from a regular parametric model, with the pa- 
rameter being given by a monotone function of the covariate. We call these 
monotone response models. Here is a formal description: Let $\{p(x,\theta) : \theta \in \Theta\}$,
with $\Theta$ being an open subinterval of $\mathbb{R}$, be a one-parameter family of
probability densities with respect to a dominating measure $\mu$. Let $\psi$ be an
increasing or decreasing continuous function defined on an interval $I$ and taking
values in $\Theta$. Consider i.i.d. data $\{(X_i, Z_i)\}_{i=1}^n$ where $Z_i \sim p_Z$, $p_Z$ being a
Lebesgue density defined on $I$, and $X_i \mid Z_i = z \sim p(x, \psi(z))$. Interest focuses
on estimating the function $\psi$, since it captures the nature of the dependence
between the response ($X$) and the covariate ($Z$). If the parametric family of
densities, $p(x,\theta)$, is parametrized by its mean, then $\psi(z) = E(X \mid Z = z)$ is
precisely the regression function. In this paper, we study the asymptotics of
the MLE of $\psi$ and also the likelihood ratio statistic for testing $\psi$ at a fixed
point of interest, with a view to obtaining pointwise confidence sets for $\psi$
of an assigned level of significance. Before we discuss this further, here are
some motivating examples to illustrate the above framework.
(a) Consider, for example, the monotone regression model where
$X_i = \psi(Z_i) + \varepsilon_i$, $\{(\varepsilon_i, Z_i)\}_{i=1}^n$ are i.i.d. random variables, $\varepsilon_i$ is independent of
$Z_i$, each $\varepsilon_i$ has mean 0 and variance $\sigma^2$, each $Z_i$ has a Lebesgue density
$p_Z(\cdot)$ and $\psi$ is a monotone function. The above model and its variants have
been fairly well studied in the literature on isotonic regression (see, e.g.,
[4, 12, 16, 18]). Now suppose that the $\varepsilon_i$'s are Gaussian. We are then in
the above framework: $Z \sim p_Z(\cdot)$ and $X \mid Z = z \sim N(\psi(z), \sigma^2)$. We want to
estimate $\psi$ and test $\psi(z_0) = \theta_0$ for an interior point $z_0$ in the domain of $\psi$.

(b) Another example is the binary choice model where we have a dichotomous
response variable $X = 1$ or $0$ and a continuous covariate $Z$ with a
Lebesgue density $p_Z(\cdot)$ such that $P(X = 1 \mid Z) \equiv \psi(Z)$ is a smooth function
of $Z$. In a biomedical context one could think of $X$ as representing the
indicator of a disease/infection and $Z$ the level of exposure to a toxin, or the
measured level of a biomarker that is predictive of the disease/infection. In
such cases it is often natural to impose a monotonicity assumption on $\psi$.
A special version of this model is the case 1 interval censoring/current status
model that is used extensively in epidemiology and has received much
attention among biostatisticians and statisticians (see, e.g., [3, 6, 9, 11]).

(c) The Poisson regression model used for count data provides yet another
example. Suppose that $Z \sim p_Z(\cdot)$ and $X \mid Z = z \sim \mathrm{Poisson}(\psi(z))$ where $\psi$ is
a monotone function. Here one can think of $Z$ as the distance of a region
from a point source (e.g., a nuclear processing plant) and $X$ the number of
cases of disease incidence at distance $Z$. The expected number of disease
cases at distance $z$ from the source ($\psi(z)$) may be expected to be monotone
decreasing in $z$. Variants of this model have received considerable attention
in epidemiological contexts [5, 17, 23].

A common feature of all three models described above is the fact that the 
conditional distribution of the response comes from a one parameter full rank 
exponential family [in (a), the variance $\sigma^2$ needs to be held fixed]. Our last
example below considers a curved exponential family model for the response
and is of a fundamentally different flavor in that explicit characterizations
of maximum likelihood estimates of $\psi$ are not available in this model, in
contrast to the preceding ones. 

(d) Conditional normality under a mean-variance relationship. Consider 
the scenario where Z has a Lebesgue density concentrated on an interval 
$[a,b]$ (with $0 < a < b$) and given $Z = z$, $X \sim p(x, \psi(z))$ for an increasing
function $\psi$, with $p(x,\theta)$ being the normal density, $\mu = c\,\theta^{-2m+1}$ and
$\sigma^2 = d\,\theta^{-2m}$ for some real $m \ge 1$, and $\theta, c, d > 0$. For $m = 1$, this reduces to a
normal density with a linear relationship between the mean and the standard 
deviation. Such a model could be postulated in a real-life setting based on, 
say, exploratory plots of the mean-variance relationship using observed data, 
or background knowledge. 
Based on existing work, one would expect $\hat\psi_n(z_0)$, the MLE of $\psi$ at a
prefixed point $z_0$, to satisfy (1.1), with $z$ replaced by $z_0$. As will be seen,
this indeed happens. This result permits the construction of (asymptotic)
confidence intervals for $\psi(z_0)$ using the quantiles of $Z$, which are well
tabulated. The constant $C(z_0)$, however, needs to be estimated and involves
nuisance parameters depending on the underlying model, in particular the
derivative of $\psi$ at $z_0$, which is difficult to estimate. Another likelihood-based
method of constructing confidence sets for $\psi(z_0)$ would involve testing
a null hypothesis of the form $H_{0,\theta} : \psi(z_0) = \theta$, using the likelihood ratio test,
for different values of $\theta$, and then inverting the acceptance region of the
likelihood ratio test; in other words, the confidence set for $\psi(z_0)$ is formed
by compiling all values of $\theta$ for which the likelihood ratio statistic does not
exceed a critical threshold. The threshold depends on $0 < \alpha < 1$, where $1 - \alpha$
is the level of confidence being sought, and the asymptotic distribution of
the likelihood ratio statistic when the null hypothesis is correct. Thus, we
are interested in studying the asymptotics of the likelihood ratio statistic
for testing the true (null) hypothesis $H_{0,\theta_0} : \psi(z_0) = \theta_0$. Pointwise null
hypotheses of this kind are very important from the perspective of estimation
since they serve as a conduit for setting confidence limits for the value of $\psi$,
through inversion.

A question that arises naturally is whether, similar to the classical para- 
metric case, we can find a universal limit distribution for the likelihood ratio 
statistic when the null hypothesis $H_{0,\theta_0}$ holds, for the monotone response
models introduced above. The hope that a universal limit may exist is bol- 
stered by the work of Banerjee and Wellner [3], who studied the limiting 
behavior of the likelihood ratio statistic for testing the value of the distribu- 
tion function (F) of the survival time at a fixed point in the current status 
model. They found that in the limit the likelihood ratio statistic behaves like 
$\mathbb{D}$, which is a well-defined functional of $W(t) + t^2$ (and is described below).
We will show that for our monotone response models, $\mathbb{D}$ does indeed arise
as the universal limit law of the likelihood ratio statistic. 

We are now in a position to describe the agenda for this paper. In Sec- 
tion 2 we give regularity conditions on the monotone response models under 
which the results in this paper are developed. We state and prove the main 
theorems describing the limit distributions of the MLEs and the likelihood 
ratio statistic. Section 3 discusses applications of the main theorems and 
Section 4 contains some concluding remarks. The Appendix contains the 
proofs of some of the lemmas used to establish the main results in Section 
2. 

2. Model assumptions, characterizations of estimators and main results. 

Consider the general monotone response model introduced in the previous
section. Let $z_0$ be an interior point of $I$ at which one seeks to estimate $\psi$.
Assume that: (a) $p_Z$ is positive and continuous in a neighborhood of $z_0$, and
(b) $\psi$ is increasing and continuously differentiable in a neighborhood of $z_0$
with $\psi'(z_0) > 0$.

The joint density of the data vector $\{(X_i, Z_i)\}_{i=1}^n$ (with respect to an
appropriate dominating measure) can be written as

    $p_n(\psi, \{(X_i, Z_i)\}_{i=1}^n) = \prod_{i=1}^n p(X_i, \psi(Z_i)) \times \prod_{i=1}^n p_Z(Z_i).$

The second factor on the right-hand side of the above display does not
involve $\psi$ and hence is irrelevant as far as computation of MLEs is concerned.
Absorbing this into the dominating measure, the likelihood function is given
by the first factor on the right-hand side of the display above. Denote by $\hat\psi_n$
the unconstrained MLE of $\psi$ and by $\hat\psi_n^0$ the MLE of $\psi$ under the constraint
imposed by the null hypothesis $H_0 : \psi(z_0) = \theta_0$. We assume:

(A.0) With probability increasing to 1 as $n \to \infty$, the MLEs $\hat\psi_n$ and $\hat\psi_n^0$
exist.

Consider the likelihood ratio statistic for testing the hypothesis $H_0 : \psi(z_0) = \theta_0$,
where $\theta_0$ is an interior point of $\Theta$. Denoting the likelihood ratio statistic
by $2 \log \lambda_n$, we have

    $2 \log \lambda_n = 2 \log \prod_{i=1}^n p(X_i, \hat\psi_n(Z_i)) - 2 \log \prod_{i=1}^n p(X_i, \hat\psi_n^0(Z_i)).$

In what follows, assume that the null hypothesis $H_0$ holds.

Further assumptions. We now state our assumptions about the parametric
model $p(x, \theta)$.

(A.1) The set $\mathcal{X}_\theta = \{x : p(x, \theta) > 0\}$ does not depend on $\theta$ and is denoted by
$\mathcal{X}$.

(A.2) $l(x,\theta) = \log p(x,\theta)$ is at least three times differentiable with respect
to $\theta$ and is strictly concave in $\theta$ for every fixed $x$ in $\mathcal{X}$. The first,
second and third partial derivatives of $l(x,\theta)$ with respect to $\theta$ will be
denoted by $\dot l(x,\theta)$, $\ddot l(x,\theta)$ and $l'''(x,\theta)$.

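(A.3) If $T$ is any statistic such that $E_\theta(|T|) < \infty$, then

    $\frac{\partial}{\partial\theta} \int_{\mathcal{X}} T(x) p(x,\theta)\,dx = \int_{\mathcal{X}} T(x) \frac{\partial}{\partial\theta} p(x,\theta)\,dx$

and

    $\frac{\partial^2}{\partial\theta^2} \int_{\mathcal{X}} T(x) p(x,\theta)\,dx = \int_{\mathcal{X}} T(x) \frac{\partial^2}{\partial\theta^2} p(x,\theta)\,dx.$

Under these assumptions, $I(\theta) \equiv E_\theta(\dot l(X,\theta)^2) = -E_\theta(\ddot l(X,\theta))$.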
(A.4) $I(\theta)$ is finite and continuous at $\theta_0$.

(A.5) There exists a neighborhood $\mathcal{N}$ of $\theta_0$ such that for all $x$,
$\sup_{\theta \in \mathcal{N}} |l'''(x,\theta)| \le B(x)$ and $\sup_{\theta \in \mathcal{N}} E_\theta(B(X)) < \infty$.

(A.6) The functions $f_1(\theta_1,\theta_2) = E_{\theta_1}(\dot l(X,\theta_2)^2)$ and $f_2(\theta_1,\theta_2) = E_{\theta_1}(\ddot l(X,\theta_2))$
are continuous in a neighborhood of $(\theta_0,\theta_0)$. Also, the function
$f_3(\theta_1,\theta_2) = E_{\theta_1}(\ddot l(X,\theta_2)^2)$ is uniformly bounded in a neighborhood
of $(\theta_0,\theta_0)$.

(A.7) Set $H(\theta, M)$ to be

    $E_\theta[(|\dot l(X,\theta)|^2 + \ddot l(X,\theta)^2)(1\{|\dot l(X,\theta)| > M\} + 1\{|\ddot l(X,\theta)| > M\})].$

Then $\lim_{M \to \infty} \sup_{\theta \in \mathcal{N}} H(\theta, M) = 0$.

We are interested in describing the asymptotic behavior of the MLEs $\hat\psi_n$
and $\hat\psi_n^0$ in local neighborhoods of $z_0$ and that of the likelihood ratio statistic
$2 \log \lambda_n$. In order to do so, we first need to introduce the basic spaces and
processes (and relevant functionals of the processes) that will figure in the
asymptotic theory.

First, define $\mathcal{C}$ to be the space of locally square integrable real-valued
functions on $\mathbb{R}$ equipped with the topology of $L_2$ convergence on compact
sets. Thus $\mathcal{C}$ comprises all functions $\phi$ that are square integrable on every
compact set and $\phi_n$ is said to converge to $\phi$ if $\int_{[-K,K]} (\phi_n(t) - \phi(t))^2\,dt \to 0$
for every $K$. The space $\mathcal{C} \times \mathcal{C}$ denotes the Cartesian product of two copies of
$\mathcal{C}$ with the usual product topology. Also, define $B_{\mathrm{loc}}(\mathbb{R})$ to be the set of all
real-valued functions defined on $\mathbb{R}$ that are bounded on every compact set,
equipped with the topology of uniform convergence on compacta. Thus $h_n$
converges to $h$ in $B_{\mathrm{loc}}(\mathbb{R})$ if $h_n$ and $h$ are bounded on every compact interval
$[-K, K]$ ($K > 0$) and $\sup_{x \in [-K,K]} |h_n(x) - h(x)| \to 0$ for every $K > 0$.

For a real-valued function $f$ defined on $\mathbb{R}$, let $\mathrm{slogcm}(f, I)$ denote the
left-hand slope of the GCM (greatest convex minorant) of the restriction of
$f$ to the interval $I$. We abbreviate $\mathrm{slogcm}(f, \mathbb{R})$ to $\mathrm{slogcm}(f)$. Also define:

    $\mathrm{slogcm}^0(f) = (\mathrm{slogcm}(f, (-\infty, 0]) \wedge 0)\,1_{(-\infty,0]}$
                        $+ (\mathrm{slogcm}(f, (0,\infty)) \vee 0)\,1_{(0,\infty)}.$

For positive constants $c$ and $d$ define the process $X_{c,d}(z) = cW(z) + dz^2$,
where $W(z)$ is standard two-sided Brownian motion starting from 0. Set
$g_{c,d} = \mathrm{slogcm}(X_{c,d})$ and $g^0_{c,d} = \mathrm{slogcm}^0(X_{c,d})$. It is known that $g_{c,d}$ is a
piecewise constant increasing function, with finitely many jumps in any compact
interval. Also $g^0_{c,d}$, like $g_{c,d}$, is a piecewise constant increasing function, with
finitely many jumps in any compact interval and differing, almost surely,
from $g_{c,d}$ on a finite interval containing 0. In fact, with probability 1, $g^0_{c,d}$ is
identically 0 in some random neighborhood of 0, whereas $g_{c,d}$ is almost surely
nonzero in some random neighborhood of 0. Also, the length of the interval
$D_{c,d}$ on which $g_{c,d}$ and $g^0_{c,d}$ differ is $O_p(1)$. For more detailed descriptions
of the processes $g_{c,d}$ and $g^0_{c,d}$, see [1, 3, 7, 26]. Thus, $g_{1,1} \equiv g$ and $g^0_{1,1} \equiv g^0$
are the unconstrained and constrained slope processes associated with the
canonical process $X_{1,1}(z)$. Finally, define $\mathbb{D} := \int ((g(z))^2 - (g^0(z))^2)\,dz$.
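
As an illustration of how $g$, $g^0$ and $\mathbb{D}$ fit together, the sketch below discretizes $X_{1,1}(z) = W(z) + z^2$ on a uniform grid and uses the fact that, on such a grid, the left slopes of the GCM coincide with the equal-weight isotonic regression of the increment ratios. The helper names `pava` and `simulate_D`, the truncation to $[-4, 4]$ and the grid step are assumptions made for this sketch; it is a rough Monte Carlo approximation, not the method used in the literature to tabulate the distribution of $\mathbb{D}$.

```python
import numpy as np

def pava(y):
    """Equal-weight least-squares nondecreasing fit (pool adjacent violators)."""
    means, sizes = [], []
    for v in y:
        means.append(float(v))
        sizes.append(1)
        # Pool the last two blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            total = sizes[-2] + sizes[-1]
            pooled = (means[-2] * sizes[-2] + means[-1] * sizes[-1]) / total
            means[-2:] = [pooled]
            sizes[-2:] = [total]
    return np.repeat(np.array(means, dtype=float), np.array(sizes, dtype=int))

def simulate_D(n_draws=50, half_width=4.0, step=0.005, seed=0):
    """Approximate draws of D = int (g^2 - (g^0)^2) dz on [-half_width, half_width]."""
    rng = np.random.default_rng(seed)
    m = int(round(half_width / step))
    mid = step * (np.arange(-m, m) + 0.5)      # midpoints of the grid cells
    out = np.empty(n_draws)
    for k in range(n_draws):
        # Increment of W(z) + z^2 over each cell, divided by the cell length.
        slopes = rng.normal(0.0, np.sqrt(step), 2 * m) / step + 2.0 * mid
        g = pava(slopes)                                  # slogcm
        g0 = np.concatenate((np.minimum(pava(slopes[:m]), 0.0),   # left of 0
                             np.maximum(pava(slopes[m:]), 0.0)))  # right of 0
        out[k] = step * np.sum(g**2 - g0**2)
    return out

d_draws = simulate_D()
print("approximate mean of D:", d_draws.mean())
```

Because $g$ and $g^0$ differ only on an interval of length $O_p(1)$ around 0, the integrand vanishes away from the origin, which is what makes truncation to a bounded window reasonable here.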

The following theorem describes the limiting behavior of the unconstrained 
and constrained MLEs of ip, appropriately normalized. 

Theorem 2.1. Let

    $X_n(h) = n^{1/3}(\hat\psi_n(z_0 + h n^{-1/3}) - \psi(z_0))$

and

    $Y_n(h) = n^{1/3}(\hat\psi_n^0(z_0 + h n^{-1/3}) - \psi(z_0)).$

Let $a = (I(\psi(z_0))\, p_Z(z_0))^{-1/2}$ and $b = (1/2)\psi'(z_0)$. Under assumptions (A.0)-(A.7)
and (a), (b), $(X_n(h), Y_n(h)) \to_d (g_{a,b}(h), g^0_{a,b}(h))$ finite dimensionally
and also in the space $\mathcal{C} \times \mathcal{C}$.

Thus $X_n(0) = n^{1/3}(\hat\psi_n(z_0) - \psi(z_0)) \to_d g_{a,b}(0)$. Using Brownian scaling it
follows that the following distributional equality holds in the space $\mathcal{C} \times \mathcal{C}$:

(2.2)    $(g_{a,b}(h), g^0_{a,b}(h)) =_d (a(b/a)^{1/3} g((b/a)^{2/3} h),\ a(b/a)^{1/3} g^0((b/a)^{2/3} h)).$

For a proof of this proposition, see, for example, [1]. Using the fact that
$g(0) =_d 2Z$ (see, e.g., [21]), we get

(2.3)    $n^{1/3}(\hat\psi_n(z_0) - \psi(z_0)) \to_d a(b/a)^{1/3} g(0) =_d (8a^2 b)^{1/3} Z.$

This is precisely the phenomenon described in (1.1).
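
As a worked illustration of the constant in (2.3), the block below specializes $C(z_0) = (8a^2 b)^{1/3}$, with $a = (I(\psi(z_0)) p_Z(z_0))^{-1/2}$ and $b = \psi'(z_0)/2$, to two of the motivating examples. The Fisher information expressions used are the standard ones for the $N(\theta, \sigma^2)$ model with $\sigma^2$ fixed and for the Bernoulli($\theta$) model; the substitutions are a sketch added here and are not displayed in this form in the text.

```latex
\begin{align*}
(8a^2 b)^{1/3}
  &= \left( \frac{4\,\psi'(z_0)}{I(\psi(z_0))\,p_Z(z_0)} \right)^{1/3}. \\[4pt]
\text{Gaussian regression, } I(\theta) = \sigma^{-2}:\quad
C(z_0) &= \left( \frac{4\,\sigma^2\,\psi'(z_0)}{p_Z(z_0)} \right)^{1/3}. \\[4pt]
\text{Binary response, } I(\theta) = \frac{1}{\theta(1-\theta)}:\quad
C(z_0) &= \left( \frac{4\,\psi(z_0)\,(1-\psi(z_0))\,\psi'(z_0)}{p_Z(z_0)} \right)^{1/3}.
\end{align*}
```

In both cases the constant involves $\psi'(z_0)$, which is the nuisance quantity that makes intervals based directly on (2.3) awkward to construct.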

Our next theorem concerns the limit distribution of the likelihood ratio 
statistic for testing Hq. 

Theorem 2.2. Under assumptions (A.0)-(A.7) and (a), (b),
$2 \log \lambda_n \to_d \mathbb{D}$ when $H_0$ is true.

Remark 1. In this paper we work under the assumption that $Z$ has
a Lebesgue density on its support. However, Theorems 2.1 and 2.2, the
main results of this paper, continue to hold under the assumption that the
distribution function of $Z$ is continuously differentiable (and hence has a
Lebesgue density) in a neighborhood of $z_0$ with a nonvanishing derivative
at $z_0$. Also, subsequently we tacitly assume that MLEs always exist; this
is not really a stronger assumption than (A.0). Since our main results deal
with convergence in distribution, we can, without loss of generality, restrict
ourselves to sets with probability tending to 1. In this paper, we focus on the
case where $\psi$ is increasing. The case where $\psi$ is decreasing is incorporated
into this framework by replacing $Z$ by $-Z$ and considering the (increasing)
function $\tilde\psi(z) = \psi(-z)$.
Characterizing $\hat\psi_n$. In what follows, we define $\phi(x,\theta) \equiv -l(x,\theta)$,
$\dot\phi(x,\theta) \equiv -\dot l(x,\theta)$, $\ddot\phi(x,\theta) \equiv -\ddot l(x,\theta)$ and $\phi'''(x,\theta) \equiv -l'''(x,\theta)$. The log-likelihood
function for the data is given by $\sum_{i=1}^n l(X_i, \psi(Z_i)) = \sum_{i=1}^n l(X_{(i)}, \psi(Z_{(i)}))$, where
$Z_{(i)}$ is the $i$th smallest covariate value and $X_{(i)}$ is the response value
corresponding to it. Finding the MLE under the constraint that $\psi$ is increasing
reduces to minimizing $\phi(u_1, u_2, \ldots, u_n) = \sum_{i=1}^n \phi(X_{(i)}, u_i)$ over all
$u_1 \le u_2 \le \cdots \le u_n$. Once we obtain the (unique) minimizer $\hat u = (\hat u_1, \hat u_2, \ldots, \hat u_n)$, the
MLE $\hat\psi_n$ at the points $\{Z_{(i)}\}_{i=1}^n$ is given by $\hat\psi_n(Z_{(i)}) = \hat u_i$ for $i = 1, 2, \ldots, n$.

For convenience, take $\Theta$ to be $\mathbb{R}$ for the subsequent discussion (this
assumption can be easily relaxed; see, in particular, Remark 2 below). By
our assumptions, $\phi$ is a (continuous) convex function defined on $\mathbb{R}^n$ and
necessary and sufficient conditions characterizing the minimizer are
obtained readily, using the Kuhn-Tucker theorem. We write the constraints as
$g(u) \le 0$, where $g(u) = (g_1(u), g_2(u), \ldots, g_{n-1}(u))^T$ and $g_i(u) = u_i - u_{i+1}$,
$i = 1, 2, \ldots, n-1$. Then there exists an $(n-1)$-dimensional vector
$\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_{n-1})^T$ with $\lambda_i \ge 0$ for all $i$, such that, if $\hat u$ is the minimizer satisfying
the constraints, $g(\hat u) \le 0$, then

    $\sum_{i=1}^{n-1} \lambda_i (\hat u_i - \hat u_{i+1}) = 0$  and  $\nabla\phi(\hat u) + G^T \lambda = 0,$

where $G$ is the $(n-1) \times n$ matrix of partial derivatives of $g$. The conditions
displayed above are often referred to as Fenchel conditions. Solving
recursively to obtain the $\lambda_i$'s (for $i = 1, 2, \ldots, n-1$), we get

(2.4)    $\lambda_i = \sum_{j=i+1}^n \nabla_j \phi(\hat u) = \sum_{j=i+1}^n \dot\phi(X_{(j)}, \hat u_j) \ge 0$    for $i = 1, 2, \ldots, n-1$

and $\sum_{j=1}^n \nabla_j \phi(\hat u) = \sum_{j=1}^n \dot\phi(X_{(j)}, \hat u_j) = 0$. Now, let $B_1, B_2, \ldots, B_k$ be the
blocks of indices on which the solution $\hat u$ is constant and let $w_j$ be the common
value on block $B_j$. The equality $\sum_{i=1}^{n-1} \lambda_i (\hat u_i - \hat u_{i+1}) = 0$ forces $\lambda_i = 0$
whenever $\hat u_i < \hat u_{i+1}$. Noting that $\nabla_r \phi(\hat u) = \dot\phi(X_{(r)}, \hat u_r)$, this implies that on
each $B_j$, $\sum_{r \in B_j} \dot\phi(X_{(r)}, w_j) = 0$. Thus $w_j$ is the unique solution to the equation
$\sum_{r \in B_j} \dot\phi(X_{(r)}, w) = 0$. Also, if $S$ is a head-subset of the block $B_j$ (i.e.,
$S$ is the ordered subset of the first few indices of the ordered set $B_j$), then
it follows that $\sum_{r \in S} \dot\phi(X_{(r)}, w_j) \le 0$.

The solution $\hat u$ can be characterized as the vector of left derivatives of the
greatest convex minorant (GCM) of a (random) cumulative sum (cusum)
diagram, as will be shown below. The cusum diagram will itself be characterized
in terms of the solution $\hat u$, giving us a self-induced characterization.
Before proceeding further, we introduce some notation. For points $\{(x_i, y_i)\}_{i=0}^n$
where $x_0 = y_0 = 0$ and $x_0 < x_1 < \cdots < x_n$, consider the left-continuous function
$P(x)$ such that $P(x_i) = y_i$ and such that $P(x)$ is constant on $(x_{i-1}, x_i)$.
We will denote the vector of slopes (left-derivatives) of the GCM of $P(x)$
computed at the points $(x_1, x_2, \ldots, x_n)$ by $\mathrm{slogcm}\{(x_i, y_i)\}_{i=0}^n$. Define the
function

    $\xi(u) = \sum_{i=1}^n [u_i - \hat u_i + \nabla_i \phi(\hat u)\, d_i^{-1}]^2 d_i$
           $= \sum_{i=1}^n [u_i - (\hat u_i - \dot\phi(X_{(i)}, \hat u_i)\, d_i^{-1})]^2 d_i,$

where $d_i = \nabla_{ii} \phi(\hat u) = \ddot\phi(X_{(i)}, \hat u_i) > 0$. The function $\xi$ is strictly convex and
it is easy to see that $\hat u$ minimizes $\xi$ subject to the constraints that
$u_1 \le u_2 \le \cdots \le u_n$ and hence is given by the isotonic regression of the function
$g(i) = \hat u_i - \dot\phi(X_{(i)}, \hat u_i)\, d_i^{-1}$ on the ordered set $\{1, 2, \ldots, n\}$ with weight
function $d_i$. It is well known that the solution
$(\hat u_1, \hat u_2, \ldots, \hat u_n) = \mathrm{slogcm}\{\sum_{j=1}^i d_j, \sum_{j=1}^i g(j)\, d_j\}_{i=0}^n$.
See, for example, Theorem 1.2.1 of [22]. In terms of the
function $\phi$ the solution can be written as

(2.5)    $\{\hat u_i\}_{i=1}^n = \mathrm{slogcm}\left\{ \sum_{j=1}^i \ddot\phi(X_{(j)}, \hat u_j),\ \sum_{j=1}^i (\hat u_j \ddot\phi(X_{(j)}, \hat u_j) - \dot\phi(X_{(j)}, \hat u_j)) \right\}_{i=0}^n.$

Recall that $\hat\psi_n(Z_{(i)}) = \hat u_i$; for a $z$ that lies strictly between $Z_{(i)}$ and $Z_{(i+1)}$,
we set $\hat\psi_n(z) = \hat\psi_n(Z_{(i)})$. The MLE $\hat\psi_n$ thus defined is a piecewise constant
right-continuous function.
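
The isotonic-regression step above can be sketched in code. In the Gaussian model of example (a) with $\sigma^2$ fixed, $\dot\phi(x,\theta) = (\theta - x)/\sigma^2$ and $\ddot\phi(x,\theta) = 1/\sigma^2$, so $g(i) = X_{(i)}$ with constant weights and the self-induced characterization collapses to a single weighted isotonic regression of the ordered responses. The function names below (`weighted_pava`, `monotone_mle_gaussian`, `evaluate_step`) and the convention used to the left of $Z_{(1)}$ are assumptions made for this sketch; for other models the weights depend on $\hat u$ and an iterative scheme would be needed.

```python
import numpy as np

def weighted_pava(g, d):
    """Nondecreasing u minimizing sum_i d_i (u_i - g_i)^2, i.e. the left slopes
    of the GCM of the cusum diagram {(sum_j d_j, sum_j g_j d_j)}."""
    vals, wts, cnts = [], [], []
    for gi, di in zip(g, d):
        vals.append(float(gi))
        wts.append(float(di))
        cnts.append(1)
        # Pool adjacent blocks while they violate monotonicity.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            v = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w
            c = cnts[-2] + cnts[-1]
            vals[-2:] = [v]
            wts[-2:] = [w]
            cnts[-2:] = [c]
    return np.repeat(np.array(vals, dtype=float), np.array(cnts, dtype=int))

def monotone_mle_gaussian(z, x):
    """Increasing MLE at the ordered covariates, Gaussian responses, fixed variance."""
    order = np.argsort(z)
    z_sorted = z[order]
    u_hat = weighted_pava(x[order], np.ones(len(x)))
    return z_sorted, u_hat

def evaluate_step(z_sorted, u_hat, z_new):
    """Right-continuous step function: value at the largest Z_(i) <= z.
    Points below Z_(1) are given the first fitted value (an assumption)."""
    idx = np.searchsorted(z_sorted, z_new, side="right") - 1
    return u_hat[np.clip(idx, 0, len(u_hat) - 1)]

z = np.array([0.3, 0.1, 0.5, 0.2, 0.4])
x = np.array([2.0, 1.0, 5.0, 3.0, 2.0])
z_sorted, u_hat = monotone_mle_gaussian(z, x)
print(u_hat)
print(evaluate_step(z_sorted, u_hat, np.array([0.35])))
```

In the example, the ordered responses are (1, 3, 2, 2, 5); the three middle values violate monotonicity and are pooled into a single block whose common value is their average.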

Characterizing $\hat\psi_n^0$. Let $m$ be the number of $Z_i$'s that are less than or
equal to $z_0$. Finding $\hat\psi_n^0$ amounts to minimizing $\phi(u) = \sum_{i=1}^n \phi(X_{(i)}, u_i)$ over
all $u_1 \le u_2 \le \cdots \le u_m \le \theta_0 \le u_{m+1} \le \cdots \le u_n$. This can be reduced to
solving two separate optimization problems. These are: (1) Minimize $\sum_{i=1}^m \phi(X_{(i)}, u_i)$
over $u_1 \le u_2 \le \cdots \le u_m \le \theta_0$ and (2) Minimize $\sum_{i=m+1}^n \phi(X_{(i)}, u_i)$ over
$\theta_0 \le u_{m+1} \le u_{m+2} \le \cdots \le u_n$.

Consider (1) first. As in the unconstrained minimization problem one can
write down the Kuhn-Tucker conditions characterizing the minimizer. It is
then easy to see that the solution $(\hat u_1^0, \hat u_2^0, \ldots, \hat u_m^0)$ can be obtained through
the following recipe: Minimize $\sum_{i=1}^m \phi(X_{(i)}, u_i)$ over $u_1 \le u_2 \le \cdots \le u_m$ to
get $(\tilde u_1, \tilde u_2, \ldots, \tilde u_m)$. Then $(\hat u_1^0, \hat u_2^0, \ldots, \hat u_m^0) = (\tilde u_1 \wedge \theta_0, \tilde u_2 \wedge \theta_0, \ldots, \tilde u_m \wedge \theta_0)$.

The solution vector to (2), say $(\hat u_{m+1}^0, \hat u_{m+2}^0, \ldots, \hat u_n^0)$, is similarly given by
$(\tilde u_{m+1} \vee \theta_0, \tilde u_{m+2} \vee \theta_0, \ldots, \tilde u_n \vee \theta_0)$, where

    $(\tilde u_{m+1}, \ldots, \tilde u_n) = \arg\min_{u_{m+1} \le u_{m+2} \le \cdots \le u_n} \sum_{i=m+1}^n \phi(X_{(i)}, u_i).$
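
The two-step recipe for the constrained solution can be sketched as follows for the Gaussian model of example (a) with known variance, where each unconstrained sub-problem is an ordinary isotonic regression and $2\log\lambda_n$ reduces to a scaled difference of residual sums of squares. The names `pava` and `gaussian_lr_statistic` and the argument `sigma2` are hypothetical and introduced only for this illustration; other monotone response models would require their own likelihood in place of the squared-error terms.

```python
import numpy as np

def pava(y):
    """Equal-weight least-squares nondecreasing fit (pool adjacent violators)."""
    means, sizes = [], []
    for v in y:
        means.append(float(v))
        sizes.append(1)
        while len(means) > 1 and means[-2] > means[-1]:
            total = sizes[-2] + sizes[-1]
            pooled = (means[-2] * sizes[-2] + means[-1] * sizes[-1]) / total
            means[-2:] = [pooled]
            sizes[-2:] = [total]
    return np.repeat(np.array(means, dtype=float), np.array(sizes, dtype=int))

def gaussian_lr_statistic(z, x, z0, theta0, sigma2=1.0):
    """Unconstrained fit, constrained fit under psi(z0) = theta0, and
    2 log(lambda_n) for Gaussian responses with known variance sigma2."""
    order = np.argsort(z)
    z_s, x_s = z[order], x[order]
    m = int(np.searchsorted(z_s, z0, side="right"))   # number of Z_(i) <= z0
    u_hat = pava(x_s)                                  # unconstrained solution
    # Left block: isotonic fit capped at theta0; right block: floored at theta0.
    u0 = np.concatenate((np.minimum(pava(x_s[:m]), theta0),
                         np.maximum(pava(x_s[m:]), theta0)))
    lr = (np.sum((x_s - u0) ** 2) - np.sum((x_s - u_hat) ** 2)) / sigma2
    return u_hat, u0, lr

z = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
x = np.array([1.0, 3.0, 2.0, 2.0, 5.0])
u_hat, u0, lr = gaussian_lr_statistic(z, x, z0=0.3, theta0=2.0)
print(u_hat, u0, lr)
```

A confidence set for $\psi(z_0)$ by inversion would repeat this computation over a grid of `theta0` values and retain those for which the statistic stays below the chosen quantile of the limit distribution.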

An important property of the constrained solution $\{\hat u_i^0\}_{i=1}^n$ is that on any
block $B$ of indices where it is constant and not equal to $\theta_0$, the constant
value, say $w_B^0$, is the unique solution to the equation

(2.6)    $\sum_{i \in B} \dot\phi(X_{(i)}, w) = 0.$



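The constrained solution also has a self-induced characterization in terms
of the slope of the greatest convex minorant of a cumulative sum diagram.
This follows in the same way as for the unconstrained solution by using the
Kuhn-Tucker theorem and formulating a quadratic optimization problem
based on the Fenchel conditions arising from this theorem. We skip the
details but give the self-consistent characterization: The constrained solution
$\hat u^0$ minimizes

    $A(u_1, u_2, \ldots, u_n) = \sum_{i=1}^n [u_i - (\hat u_i^0 - \nabla_i \phi(\hat u^0)\, d_i^{-1})]^2 d_i$

subject to the constraints that $u_1 \le u_2 \le \cdots \le u_m \le \theta_0 \le u_{m+1} \le \cdots \le u_n$, where
$d_i = \nabla_{ii} \phi(\hat u^0)$. It is not difficult to see that

(2.7)    $\{\hat u_i^0\}_{i=1}^m = \mathrm{slogcm}\left\{ \sum_{j=1}^i \ddot\phi(X_{(j)}, \hat u_j^0),\ \sum_{j=1}^i (\hat u_j^0 \ddot\phi(X_{(j)}, \hat u_j^0) - \dot\phi(X_{(j)}, \hat u_j^0)) \right\}_{i=0}^m \wedge \theta_0$

and

(2.8)    $\{\hat u_i^0\}_{i=m+1}^n = \mathrm{slogcm}\left\{ \sum_{j=m+1}^i \ddot\phi(X_{(j)}, \hat u_j^0),\ \sum_{j=m+1}^i (\hat u_j^0 \ddot\phi(X_{(j)}, \hat u_j^0) - \dot\phi(X_{(j)}, \hat u_j^0)) \right\}_{i=m}^n \vee \theta_0.$

The constrained MLE $\hat\psi_n^0$ is the piecewise constant right-continuous function
satisfying $\hat\psi_n^0(Z_{(i)}) = \hat u_i^0$ for $i = 1, 2, \ldots, n$, $\hat\psi_n^0(z_0) = \theta_0$ and having no jump
points outside the set $\{Z_{(i)}\}_{i=1}^n \cup \{z_0\}$.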



Remark 2. The characterization of the estimators above does not take
into consideration boundary constraints on $\psi$. However, in certain models,
the very nature of the problem imposes natural boundary constraints; for
example, the parameter space for the parametric model may be naturally
nonnegative [example (d) discussed above], in which case the constraint
$0 < u_1$ needs to be enforced. Similarly, there can be situations where $u_n$ is
constrained to lie below some natural bound. In such cases, Fenchel conditions
may be derived in the usual fashion by applying the Kuhn-Tucker theorem
and self-induced characterizations may be derived similarly as above.
However, as the sample size $n$ grows, with probability increasing to 1, the
Fenchel conditions characterizing the estimator in a neighborhood of $z_0$ will
remain unaffected by these additional boundary constraints, since $\psi(z_0)$ is
assumed to lie in the interior of the parameter space, and the asymptotic
distributional results will remain unaffected.

For this paper, we will assume the (uniform) almost sure consistency
of the MLEs $\hat\psi_n$ and $\hat\psi_n^0$ for $\psi$ in a closed neighborhood of $z_0$. For the
purpose of deducing the limit distributions of the MLEs and the likelihood
ratio statistic, the following lemma, which guarantees local consistency at
an appropriate rate, is crucial.

Lemma 2.1. For any $M_0 > 0$,

    $\max\left\{ \sup_{h \in [-M_0, M_0]} |\hat\psi_n(z_0 + h n^{-1/3}) - \psi(z_0)|,\ \sup_{h \in [-M_0, M_0]} |\hat\psi_n^0(z_0 + h n^{-1/3}) - \psi(z_0)| \right\}$

is $O_p(n^{-1/3})$.

We next state a number of preparatory lemmas required in the proofs
of Theorems 2.1 and 2.2. But before that we need to introduce further
notation. Let $\mathbb{P}_n$ denote the empirical measure of the data that assigns
mass $1/n$ to each observation $(X_i, Z_i)$. For a monotone function $\Lambda$ defined
on $I$ and taking values in $\Theta$, define the following processes:
$W_{n,\Lambda}(r) = \mathbb{P}_n[\dot\phi(X, \Lambda(Z)) 1(Z \le r)]$, $G_{n,\Lambda}(r) = \mathbb{P}_n[\ddot\phi(X, \Lambda(Z)) 1(Z \le r)]$ and
$B_{n,\Lambda}(r) = \int_{-\infty}^r \Lambda(z)\,dG_{n,\Lambda}(z) - W_{n,\Lambda}(r)$. Also, define normalized processes $\tilde B_{n,\Lambda}(h)$ and
$\tilde G_{n,\Lambda}(h)$ in the following manner:

    $\tilde B_{n,\Lambda}(h) = n^{2/3} [(B_{n,\Lambda}(z_0 + h n^{-1/3}) - B_{n,\Lambda}(z_0))$
                  $- \psi(z_0)(G_{n,\Lambda}(z_0 + h n^{-1/3}) - G_{n,\Lambda}(z_0))]$
                  $\times (I(\psi(z_0))\, p_Z(z_0))^{-1}$

and

    $\tilde G_{n,\Lambda}(h) = n^{1/3} \frac{1}{I(\psi(z_0))\, p_Z(z_0)} (G_{n,\Lambda}(z_0 + h n^{-1/3}) - G_{n,\Lambda}(z_0)).$

Lemma 2.2. The process $\tilde B_{n,\psi}(h) \to_d X_{a,b}(h)$ in the space $B_{\mathrm{loc}}(\mathbb{R})$, where
$a$ and $b$ are as defined in Theorem 2.1.

Lemma 2.3. For every $K > 0$, the following asymptotic equivalences
hold:

    $\sup_{h \in [-K,K]} |\tilde B_{n,\hat\psi_n}(h) - \tilde B_{n,\psi}(h)| \to_p 0$

and

    $\sup_{h \in [-K,K]} |\tilde B_{n,\hat\psi_n^0}(h) - \tilde B_{n,\psi}(h)| \to_p 0.$

Lemma 2.4. The processes $\tilde G_{n,\hat\psi_n}(h)$ and $\tilde G_{n,\hat\psi_n^0}(h)$ both converge
uniformly (in probability) to the deterministic function $h$ on the compact
interval $[-K, K]$, for every $K > 0$.

The next lemma characterizes the set $D_n$ on which $\hat\psi_n$ and $\hat\psi_n^0$ vary.

Lemma 2.5. Let $D_n$ denote the interval around $z_0$ on which $\hat\psi_n$ and $\hat\psi_n^0$
differ. Given any $\varepsilon > 0$, we can find an $M > 0$, such that for all sufficiently
large $n$,

    $P(D_n \subset [z_0 - M n^{-1/3}, z_0 + M n^{-1/3}]) \ge 1 - \varepsilon.$

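Lemma 2.6 ([21]). Suppose that $\{W_{n\varepsilon}\}$, $\{W_n\}$ and $\{W_\varepsilon\}$ are three sets
of random vectors such that:

(i) $\lim_{\varepsilon \to 0} \limsup_{n \to \infty} P[W_{n\varepsilon} \ne W_n] = 0$,

(ii) $\lim_{\varepsilon \to 0} P[W_\varepsilon \ne W] = 0$ and

(iii) for every $\varepsilon > 0$, $W_{n\varepsilon} \to_d W_\varepsilon$ as $n \to \infty$.

Then $W_n \to_d W$, as $n \to \infty$.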

Proof of Theorem 2.1. The proof presented here relies on continuous-mapping
arguments for slopes of greatest convex minorant estimators. From
the self-induced characterization of $\hat\psi_n$ [see (2.5)], we have

    $\hat\psi_n(z) - \psi(z_0) = \mathrm{slogcm}((B_{n,\hat\psi_n} - \psi(z_0) G_{n,\hat\psi_n}) \circ G_{n,\hat\psi_n}^{-1})(G_{n,\hat\psi_n}(z)).$

Let $h = n^{1/3}(z - z_0)$ be the local variable and recall the normalized processes
that were defined before the statement of Lemma 2.2. In terms of the local
variable and the normalized processes, it is not difficult to see that

    $n^{1/3}(\hat\psi_n(z_0 + h n^{-1/3}) - \psi(z_0)) = \mathrm{slogcm}(\tilde B_{n,\hat\psi_n} \circ \tilde G_{n,\hat\psi_n}^{-1})(\tilde G_{n,\hat\psi_n}(h)).$

Similarly, from the characterization of $\hat\psi_n^0$ [refer to (2.7) and (2.8)] and the
definitions of the normalized processes it follows that

    $n^{1/3}(\hat\psi_n^0(z_0 + h n^{-1/3}) - \psi(z_0)) = \mathrm{slogcm}^0(\tilde B_{n,\hat\psi_n^0} \circ \tilde G_{n,\hat\psi_n^0}^{-1})(\tilde G_{n,\hat\psi_n^0}(h)).$

Thus,

(2.9)    $(X_n(h), Y_n(h)) = \{\mathrm{slogcm}(\tilde B_{n,\hat\psi_n} \circ \tilde G_{n,\hat\psi_n}^{-1})(\tilde G_{n,\hat\psi_n}(h)),\ \mathrm{slogcm}^0(\tilde B_{n,\hat\psi_n^0} \circ \tilde G_{n,\hat\psi_n^0}^{-1})(\tilde G_{n,\hat\psi_n^0}(h))\}.$

By Lemma 2.3, the processes $\tilde B_{n,\hat\psi_n}(h) - \tilde B_{n,\psi}(h)$ and $\tilde B_{n,\hat\psi_n^0}(h) - \tilde B_{n,\psi}(h)$
converge in probability to 0 uniformly on every compact set. Furthermore,
by Lemma 2.2, the process $\tilde B_{n,\psi}(h)$ converges to the process $X_{a,b}(h)$ in
$B_{\mathrm{loc}}(\mathbb{R})$. It follows that the processes

    $(\tilde B_{n,\hat\psi_n}(h), \tilde B_{n,\hat\psi_n^0}(h)) \to_d (X_{a,b}(h), X_{a,b}(h)),$

in the space $B_{\mathrm{loc}}(\mathbb{R}) \times B_{\mathrm{loc}}(\mathbb{R})$ equipped with the product topology.
Furthermore, by Lemma 2.4, the processes

    $(\tilde G_{n,\hat\psi_n}(h), \tilde G_{n,\hat\psi_n^0}(h)) \to_p (h, h).$

The proof is now completed by invoking continuous mapping arguments for
slopes of greatest convex minorant estimators: thus, the limit distributions
of $X_n$ and $Y_n$ are obtained by replacing the processes on the right-hand
side of (2.9) by their limits. The details of the arguments are available in
Theorem 2.1 of [2]. It follows that for any $(h_1, h_2, \ldots, h_k)$,

    $\{X_n(h_i), Y_n(h_i)\}_{i=1}^k \to_d \{\mathrm{slogcm}(X_{a,b})(h_i), \mathrm{slogcm}^0(X_{a,b})(h_i)\}_{i=1}^k$
                                 $= \{g_{a,b}(h_i), g^0_{a,b}(h_i)\}_{i=1}^k.$

The above finite-dimensional convergence, coupled with the monotonicity of
the functions involved, allows us to conclude that $(X_n(h), Y_n(h)) \to_d (g_{a,b}(h),
g^0_{a,b}(h))$ in $\mathcal{C} \times \mathcal{C}$ as well. Indeed, if a sequence $(\psi_n, \phi_n)$ of monotone functions
converges pointwise to the monotone functions $(\psi, \phi)$, then $(\psi_n, \phi_n)$ also
converges to $(\psi, \phi)$ in $\mathcal{C} \times \mathcal{C}$ (see the result of Corollary 3 following Theorem
3 of [14]). □



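Proof of Theorem 2.2. We have

    $2 \log \lambda_n = -2 \left[ \sum_{i \in J_n} \phi(X_{(i)}, \hat\psi_n(Z_{(i)})) - \sum_{i \in J_n} \phi(X_{(i)}, \hat\psi_n^0(Z_{(i)})) \right] = -2 S_n,$

say. Here $J_n$ is the set of indices for which $\hat\psi_n(Z_{(i)})$ and $\hat\psi_n^0(Z_{(i)})$ are different.
By Taylor expansion about $\psi(z_0)$, we find that $S_n$ equals

    $\sum_{i \in J_n} \dot\phi(X_{(i)}, \psi(z_0))(\hat\psi_n(Z_{(i)}) - \psi(z_0)) + \frac{1}{2} \sum_{i \in J_n} \ddot\phi(X_{(i)}, \psi(z_0))(\hat\psi_n(Z_{(i)}) - \psi(z_0))^2$
    $- \sum_{i \in J_n} \dot\phi(X_{(i)}, \psi(z_0))(\hat\psi_n^0(Z_{(i)}) - \psi(z_0)) - \frac{1}{2} \sum_{i \in J_n} \ddot\phi(X_{(i)}, \psi(z_0))(\hat\psi_n^0(Z_{(i)}) - \psi(z_0))^2$
    $+ R_n,$

with $R_n = R_{n,1} - R_{n,2}$, where $R_{n,1} = (1/6) \sum_{i \in J_n} \phi'''(X_{(i)}, \psi^{*}_{n,i})(\hat\psi_n(Z_{(i)}) - \psi(z_0))^3$
and $R_{n,2} = (1/6) \sum_{i \in J_n} \phi'''(X_{(i)}, \psi^{**}_{n,i})(\hat\psi_n^0(Z_{(i)}) - \psi(z_0))^3$ for points
$\psi^{*}_{n,i}$ [lying between $\hat\psi_n(Z_{(i)})$ and $\psi(z_0)$] and $\psi^{**}_{n,i}$ [lying between $\hat\psi_n^0(Z_{(i)})$ and
$\psi(z_0)$]. Under our assumptions $R_n$ is $o_p(1)$, as will be established later. Thus,
we can write $S_n = I_n + II_n + o_p(1)$, where $I_n \equiv I_{n,1} - I_{n,2}$ with

    $I_{n,1} = \sum_{i \in J_n} \dot\phi(X_{(i)}, \psi(z_0))(\hat\psi_n(Z_{(i)}) - \psi(z_0)),$

    $I_{n,2} = \sum_{i \in J_n} \dot\phi(X_{(i)}, \psi(z_0))(\hat\psi_n^0(Z_{(i)}) - \psi(z_0))$

and

(2.10)    $II_n = \frac{1}{2} \sum_{i \in J_n} \ddot\phi(X_{(i)}, \psi(z_0))(\hat\psi_n(Z_{(i)}) - \psi(z_0))^2$
               $- \frac{1}{2} \sum_{i \in J_n} \ddot\phi(X_{(i)}, \psi(z_0))(\hat\psi_n^0(Z_{(i)}) - \psi(z_0))^2.$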
Consider the term $I_{n,2}$. Now, $J_n$ can be written as the union of blocks of indices, say $B_1, B_2, \ldots, B_l$, such that the constrained solution $\hat\psi_n^0$ is constant on each of these blocks. Let $B$ denote a typical block and let $w_B$ denote the constant value of the constrained MLE on this block; thus $\hat\psi_n^0(Z_{(j)}) = w_B$ for each $j\in B$. For any block $B$ where $w_B\neq\theta_0$ we can write $\sum_{j\in B}\dot\phi(X_{(j)},\psi(z_0))\times(w_B-\psi(z_0))$ as $(w_B-\psi(z_0))H$, where

\[
H=\sum_{j\in B}\Bigl[\dot\phi(X_{(j)},w_B)+(\psi(z_0)-w_B)\ddot\phi(X_{(j)},w_B)
+\tfrac{1}{2}(\psi(z_0)-w_B)^2\phi'''(X_{(j)},w_B^*)\Bigr],
\]

for some point $w_B^*$ between $w_B$ and $\psi(z_0)$. Using the fact that on each block $B$ where $w_B\neq\theta_0$, we have $\sum_{j\in B}\dot\phi(X_{(j)},w_B)=0$ [from (2.6)], it follows that $(w_B-\psi(z_0))H$ equals

\[
-\sum_{j\in B}(\psi(z_0)-w_B)^2\,\ddot\phi(X_{(j)},w_B)
+\frac{1}{2}\sum_{j\in B}(w_B-\psi(z_0))^3\,\phi'''(X_{(j)},w_B^*).
\]

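We conclude that $I_{n,2}$ equals

\[
-\sum_{i\in J_n}\ddot\phi\bigl(X_{(i)},\hat\psi_n^0(Z_{(i)})\bigr)\bigl(\hat\psi_n^0(Z_{(i)})-\psi(z_0)\bigr)^2
+\frac{1}{2}\sum_{i\in J_n}\phi'''\bigl(X_{(i)},\psi^*(Z_{(i)})\bigr)\bigl(\hat\psi_n^0(Z_{(i)})-\psi(z_0)\bigr)^3,
\]

where $\psi^*(Z_{(i)})$ is a point between $\hat\psi_n^0(Z_{(i)})$ and $\psi(z_0)$. The second term in the above display is shown to be $o_p(1)$ by the exact same reasoning as used for $R_{n,1}$ or $R_{n,2}$. Hence, $I_{n,2} = -\sum_{i\in J_n}\ddot\phi(X_{(i)},\hat\psi_n^0(Z_{(i)}))(\hat\psi_n^0(Z_{(i)})-\psi(z_0))^2 + o_p(1)$, which by a one-step Taylor expansion about $\psi(z_0)$ can be seen to be equal to $-\sum_{i\in J_n}\ddot\phi(X_{(i)},\psi(z_0))(\hat\psi_n^0(Z_{(i)})-\psi(z_0))^2$ up to a $o_p(1)$ term. Similarly $I_{n,1} = -\sum_{i\in J_n}\ddot\phi(X_{(i)},\psi(z_0))(\hat\psi_n(Z_{(i)})-\psi(z_0))^2 + o_p(1)$. Now, using the fact that $S_n = I_{n,1}-I_{n,2}+II_n+o_p(1)$ and using the representations for these terms derived above, we find that up to a $o_p(1)$ term, $S_n$ equals

\[
-\frac{1}{2}\Biggl\{\sum_{i\in J_n}\ddot\phi(X_{(i)},\psi(z_0))\bigl(\hat\psi_n(Z_{(i)})-\psi(z_0)\bigr)^2
-\sum_{i\in J_n}\ddot\phi(X_{(i)},\psi(z_0))\bigl(\hat\psi_n^0(Z_{(i)})-\psi(z_0)\bigr)^2\Biggr\},
\]

whence $2\log\lambda_n$ is given by

\[
\sum_{i\in J_n}\ddot\phi(X_{(i)},\psi(z_0))\bigl(\hat\psi_n(Z_{(i)})-\psi(z_0)\bigr)^2
-\sum_{i\in J_n}\ddot\phi(X_{(i)},\psi(z_0))\bigl(\hat\psi_n^0(Z_{(i)})-\psi(z_0)\bigr)^2+o_p(1).
\]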

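Letting $\xi_n(x,z)$ denote the random function

\[
\ddot\phi(x,\psi(z_0))\Bigl\{\bigl(n^{1/3}(\hat\psi_n(z)-\psi(z_0))\bigr)^2-\bigl(n^{1/3}(\hat\psi_n^0(z)-\psi(z_0))\bigr)^2\Bigr\}1(z\in D_n),
\]

it is easily seen that

\[
2\log\lambda_n = n^{1/3}(\mathbb{P}_n-P)\xi_n(x,z)+n^{1/3}P\xi_n(x,z)+o_p(1).
\]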






The term $n^{1/3}(\mathbb{P}_n-P)\xi_n(x,z)\to_p 0$ by Lemma 2.7 below. It now remains to deal with the term $n^{1/3}P(\xi_n(x,z))$ and as we will see, it is this term that contributes to the likelihood ratio statistic in the limit. We can write $n^{1/3}P\xi_n(x,z)$ as

\[
n^{1/3}\int_{D_n}E_{\psi(z)}\bigl[\ddot\phi(X,\psi(z_0))\bigr]
\Bigl\{\bigl(n^{1/3}(\hat\psi_n(z)-\psi(z_0))\bigr)^2-\bigl(n^{1/3}(\hat\psi_n^0(z)-\psi(z_0))\bigr)^2\Bigr\}p_Z(z)\,dz .
\]

On changing to the local variable $h = n^{1/3}(z-z_0)$ and denoting $z_0+hn^{-1/3}$ by $z_n(h)$, the above can be decomposed as $A_n+B_n$, where

\[
A_n=\int_{\tilde D_n}\bigl[E_{\psi(z_0)}\ddot\phi(X,\psi(z_0))\bigr]\bigl(X_n^2(h)-Y_n^2(h)\bigr)p_Z(z_n(h))\,dh
\]

and

\[
B_n=\int_{\tilde D_n}\bigl[E_{\psi(z_n(h))}\ddot\phi(X,\psi(z_0))-E_{\psi(z_0)}\ddot\phi(X,\psi(z_0))\bigr]
\bigl(X_n^2(h)-Y_n^2(h)\bigr)p_Z(z_n(h))\,dh,
\]

where $\tilde D_n = n^{1/3}(D_n-z_0)$. The term $B_n$ converges to 0 in probability on using the facts that eventually, with arbitrarily high probability, $\tilde D_n$ is contained in an interval of the form $[-M,M]$ on which the processes $X_n$ and $Y_n$ are $O_p(1)$ and that for every $M>0$, $\sup_{|h|\le M}|E_{\psi(z_0+hn^{-1/3})}(\ddot\phi(X,\psi(z_0)))-E_{\psi(z_0)}(\ddot\phi(X,\psi(z_0)))|\to 0$ by (A.6). Thus,

\[
\begin{aligned}
2\log\lambda_n &= I(\psi(z_0))\int_{\tilde D_n}\bigl(X_n^2(h)-Y_n^2(h)\bigr)p_Z(z_0+hn^{-1/3})\,dh+o_p(1)\\
&=\frac{1}{a^2}\int_{\tilde D_n}\bigl(X_n^2(h)-Y_n^2(h)\bigr)\,dh+o_p(1).
\end{aligned}
\]

We now deduce the asymptotic distribution of the expression on the right-hand side of the above display, using Lemma 2.6. Set $W_n = a^{-2}\int_{\tilde D_n}(X_n^2(h)-Y_n^2(h))\,dh$ and $W = a^{-2}\int\{(g_{a,b}(h))^2-(g^0_{a,b}(h))^2\}\,dh$. Using Lemma 2.5, for each $\varepsilon>0$, we can find a compact set $M_\varepsilon$ of the form $[-K_\varepsilon,K_\varepsilon]$ such that eventually, $P[\tilde D_n\subset[-K_\varepsilon,K_\varepsilon]]>1-\varepsilon$ and $P[D_{a,b}\subset[-K_\varepsilon,K_\varepsilon]]>1-\varepsilon$. Here $D_{a,b}$ is the set on which the processes $g_{a,b}$ and $g^0_{a,b}$ vary. Now let $W_{n\varepsilon} = a^{-2}\int_{[-K_\varepsilon,K_\varepsilon]}(X_n^2(h)-Y_n^2(h))\,dh$ and $W_\varepsilon = \int_{[-K_\varepsilon,K_\varepsilon]}(1/a^2)\{(g_{a,b}(h))^2-(g^0_{a,b}(h))^2\}\,dh$. Since $[-K_\varepsilon,K_\varepsilon]$ contains $\tilde D_n$ with probability greater than $1-\varepsilon$ eventually ($\tilde D_n$ is the left closed, right open interval over which the processes $X_n$ and $Y_n$ differ), we have $P[W_{n\varepsilon}\neq W_n]<\varepsilon$ eventually. Similarly $P[W_\varepsilon\neq W]<\varepsilon$. Also $W_{n\varepsilon}\to_d W_\varepsilon$ as $n\to\infty$, for every fixed $\varepsilon$. This is so because by Theorem 2.1 $(X_n(h),Y_n(h))\to_d(g_{a,b}(h),g^0_{a,b}(h))$ as a process in $\mathcal{L}\times\mathcal{L}$ and $(f_1,f_2)\mapsto\int_{[-K_\varepsilon,K_\varepsilon]}(f_1^2(h)-f_2^2(h))\,dh$ is a continuous real-valued function defined from $\mathcal{L}\times\mathcal{L}$ to the reals. Thus all conditions of Lemma 2.6 are satisfied, leading to the conclusion that $W_n\to_d W$. The fact that the limiting distribution is actually independent of the constants $a$ and $b$, thereby showing universality, falls out from Brownian scaling. Using (2.2) we obtain

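\[
\begin{aligned}
W &= \frac{1}{a^2}\int\{(g_{a,b}(h))^2-(g^0_{a,b}(h))^2\}\,dh\\
&\equiv_d \frac{1}{a^2}\,a^2(b/a)^{2/3}\int\bigl\{\bigl(g_{1,1}((b/a)^{2/3}h)\bigr)^2-\bigl(g^0_{1,1}((b/a)^{2/3}h)\bigr)^2\bigr\}\,dh\\
&=\int\{(g_{1,1}(w))^2-(g^0_{1,1}(w))^2\}\,dw,
\end{aligned}
\]

on making the change of variable $w=(b/a)^{2/3}h$. It only remains to show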
that $R_n$ is $o_p(1)$ as stated earlier. We outline the proof for $R_{n,1}$; the proof for $R_{n,2}$ is similar. We can write

\[
R_{n,1}=\frac{1}{6}\,\mathbb{P}_n\bigl[\phi'''(X,\psi_n^*(Z))\{n^{1/3}(\hat\psi_n(Z)-\psi(z_0))\}^3\,1(Z\in D_n)\bigr],
\]

where $\psi_n^*(Z)$ is some point between $\hat\psi_n(Z)$ and $\psi(z_0)$. On using the facts that $D_n$ is eventually contained in a set of the form $[z_0-Mn^{-1/3},z_0+Mn^{-1/3}]$ with arbitrarily high probability on which $\{n^{1/3}(\hat\psi_n(Z)-\psi(z_0))\}^3$ is $O_p(1)$ and (A.5), we conclude that eventually, with arbitrarily high probability,

\[
\begin{aligned}
|R_{n,1}| &\le C(\mathbb{P}_n-P)\bigl[B(X)1(Z\in[z_0-Mn^{-1/3},z_0+Mn^{-1/3}])\bigr]\\
&\quad+CP\bigl[B(X)1(Z\in[z_0-Mn^{-1/3},z_0+Mn^{-1/3}])\bigr],
\end{aligned}
\]

for some constant $C$. That the first term on the right-hand side goes to 0 in probability is a consequence of an extended Glivenko-Cantelli theorem (see, e.g., Proposition 2 or Theorem 3 of [25]), whereas the second term goes to 0 by direct computation. □

Lemma 2.7. With $\xi_n(x,z)$ as defined in the proof of Theorem 2.2, we have $n^{1/3}(\mathbb{P}_n-P)\xi_n(x,z)\to_p 0$.

The proof of this lemma uses standard arguments from empirical process 
theory and can be found in [2]. 

3. Applications of the main results. In this section, we discuss some 
interesting special cases of monotone response models. 

Consider the case of a one-parameter full rank exponential family model, naturally parametrized. Thus, $p(x,\theta)=\exp[\theta T(x)-C(\theta)]h(x)$ where $\theta$ varies in an open interval $\Theta$. The function $C$ possesses derivatives of all orders. Suppose we have $Z\sim p_Z(\cdot)$ and $X\mid Z=z\sim p(x,\psi(z))$ where $\psi$ is increasing or decreasing in $z$. We are interested in making inference on $\psi(z_0)$, where $z_0$ is an interior point in the support of $Z$. If $p_Z$ and $\psi$ satisfy conditions (a) and (b) of Section 2, the likelihood ratio statistic for testing $\psi(z_0)=\theta_0$ converges to $\mathbb{D}$ under the null hypothesis, since conditions (A.0)-(A.7) are readily satisfied for exponential family models. Note that $l(x,\theta)=\theta T(x)-C(\theta)+\log h(x)$, so that $\dot l(x,\theta)=T(x)-C'(\theta)$ and $\ddot l(x,\theta)=-C''(\theta)$. Conditions (A.1)-(A.6) can be checked quite readily. We leave the details to the reader. To check condition (A.7), note that since $C'(\theta)$ and $\ddot l(x,\theta)$ are uniformly bounded for $\theta\in(\theta_0-\varepsilon,\theta_0+\varepsilon)\equiv N$, by choosing $M$ sufficiently large, we can ensure that for some constant $\gamma$ and $\theta\in N$, $H(\theta,M)\le E_\theta[(2T(X)^2+\gamma)1\{|T(X)|>M/2\}]$, which in turn is dominated by



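\[
\sup_{\theta\in N}e^{-C(\theta)}\int(2T(x)^2+\gamma)\bigl(e^{(\theta_0-\varepsilon)T(x)}+e^{(\theta_0+\varepsilon)T(x)}\bigr)\,1(|T(x)|>M/2)\,h(x)\,d\mu(x).
\]

The expression above is not dependent on $\theta$ and hence serves as a bound for $\sup_{\theta\in N}H(\theta,M)$. As $M$ goes to $\infty$ the above expression goes to 0; this is seen by an appeal to the DCT and the fact that $T^2(X)+\gamma$ is integrable at parameter values $\theta_0-\varepsilon$ and $\theta_0+\varepsilon$.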

The nice structure of exponential family models actually leads to a simpler characterization of the MLEs $\hat\psi_n$ and $\hat\psi_n^0$. For each block $B$ of indices on which $\hat\psi_n(Z_{(i)})$ is constant with common value equal to, say $w$, it follows from the discussion of the Fenchel conditions that characterize $\hat\psi_n$ in Section 2 that $\sum_{i\in B}(T(X_{(i)})-C'(w))=0$; hence $C'(w)=n_B^{-1}\sum_{i\in B}T(X_{(i)})$ where $n_B$ is the number of indices in the block $B$. Furthermore, from the Fenchel conditions, it follows that if $S$ is a head-subset of the block $B$, then $\sum_{i\in S}(T(X_{(i)})-C'(w))\ge 0$; that is, $C'(w)\le n_S^{-1}\sum_{i\in S}T(X_{(i)})$, $n_S$ denoting the cardinality of $S$. As a direct consequence of the above, we deduce that the unconstrained MLE $\hat\psi_n$ can actually be written as $\{C'(\hat\psi_n(Z_{(i)}))\}_{i=1}^n = \mathrm{slogcm}\{G_n(Z_{(i)}),V_n(Z_{(i)})\}_{i=0}^n$, where $G_n(z)=n^{-1}\sum_{i=1}^n 1(Z_i\le z)$ and $V_n(z)=n^{-1}\sum_{i=1}^n T(X_i)1(Z_i\le z)$ and $G_n(Z_{(0)})=V_n(Z_{(0)})=0$. The MLE $\hat\psi_n^0$ is characterized in a similar fashion but as constrained slopes of the cumulative sum diagram formed by the points $\{G_n(Z_{(i)}),V_n(Z_{(i)})\}_{i=0}^n$. Thus, the MLEs have explicit characterizations for these models and their asymptotic distributions may also be obtained by direct methods. It is not difficult to check that examples (a), (b) and (c) discussed in the Introduction are special cases of the one-parameter full rank exponential family models discussed above. Theorems 2.1 and 2.2 therefore hold for these models, and MLEs have explicit characterizations and are easily computable. In particular, the result on the likelihood ratio statistic derived in [3] follows as a special case of our current results.
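
The slope-of-greatest-convex-minorant characterization above can be sketched in a few lines of code. With equal weights, the left slopes of the greatest convex minorant of the cumulative sum diagram coincide with the isotonic regression of $T(X_{(i)})$ on the ordered $Z_{(i)}$, which the pool-adjacent-violators algorithm computes. The following is only an illustrative sketch under stated assumptions: $\psi$ is taken to be increasing, the $Z_i$ are assumed distinct, and the function names `pava` and `monotone_mean_fit` as well as the toy Poisson data are hypothetical, not taken from the paper.

```python
import math

def pava(y):
    """Isotonic (nondecreasing) least-squares fit by pool-adjacent-violators.

    With equal weights this returns the left slopes of the greatest convex
    minorant of the cumulative sum diagram of y.
    """
    blocks = []  # each block stores [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        # Pool while the previous block mean exceeds the current block mean.
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def monotone_mean_fit(z, t):
    """Return the ordered z values and the fitted values C'(psi_hat(Z_(i))).

    z: covariates (assumed distinct); t: sufficient statistics T(X_i).
    Assumes psi is increasing (hypothetical helper, not from the paper).
    """
    order = sorted(range(len(z)), key=lambda i: z[i])
    return [z[i] for i in order], pava([t[i] for i in order])

# Toy Poisson illustration in the natural parametrization (an assumption):
# T(x) = x and C(theta) = exp(theta), so psi_hat = log(fitted mean).
z = [0.1, 0.2, 0.3, 0.4, 0.5]
x = [1, 3, 2, 4, 6]
z_sorted, mean_fit = monotone_mean_fit(z, x)
psi_hat = [math.log(m) for m in mean_fit]
```

Inverting $C'$ is model specific; in the toy Poisson case it is the logarithm. The violating pair $(3,2)$ is pooled into a single block with common fitted mean $2.5$, which is the block-average identity $C'(w)=n_B^{-1}\sum_{i\in B}T(X_{(i)})$ stated above.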
We now turn to example (d); here, the conditional distribution of the response given the covariate comes from a curved exponential family. We study this example in some detail. In this case, with $c=d=1$, the Fenchel conditions for $\hat\psi_n$ take the following form (in empirical process notation):

\[
\begin{aligned}
&m\int_{z\in[a,t)}\frac{1}{\hat\psi_n(z)}\,d\mathbb{P}_n(x,z)+\int_{z\in[a,t)}x\,d\mathbb{P}_n(x,z)
+(m-1)\int_{z\in[a,t)}\frac{1}{\hat\psi_n(z)^{2m-1}}\,d\mathbb{P}_n(x,z)\\
&\qquad-m\int_{z\in[a,t)}x^2\,\hat\psi_n(z)^{2m-1}\,d\mathbb{P}_n(x,z)\ge 0,\qquad t\in[a,b],
\end{aligned}
\]

with equality if $t$ is a jump point for $\hat\psi_n$. For a general (real) $m$, the above
equations and inequalities do not translate into an explicit characterization 
of the MLE as the solution to an isotonic regression problem. The mileage 
that we get in the full rank exponential family models is no longer available 
to us owing to the more complex analytical structure of the current model. 
The self-induced characterization nevertheless allows us to write down the 
MLE as a slope of greatest convex minorant estimator, and determine its 
asymptotic behavior, following the route described in this paper (we leave 
the verification of the regularity conditions on the parametric model to the 
reader). To simplify matters, take $m=1$. In this case, the joint distribution of $(X,Z)$ is given by $g(x,z)=p(x\mid\psi(z))h(z)$ where $h$ is the Lebesgue density of $Z$, $p(x\mid y)=yf(xy-1)$ and $f$ is the standard normal density. For our simulation study, we chose $[a,b]=[1,2]$ and $\psi(z)=z$ and $Z$ to follow the uniform distribution on $[1,2]$. Then, $g(x,z)=p(x\mid z)=zf(xz-1)$, for $x\in\mathbb{R}$, $z\in[1,2]$. Essentially, we are in the setting of a mixture model. Following the discussion of the self-induced characterization for $\hat\psi_n$ described in Section 2, we find that $\hat\psi_n$, for this example, is the slope of the convex minorant of the "self-induced" cumulative sum diagram,

\[
\bigl\{\mathbb{P}_n[(x^2+\hat\psi_n(z)^{-2})1(z\in[a,t))],\ \mathbb{P}_n[(2\hat\psi_n^{-1}(z)+x)1(z\in[a,t))]: t\in[a,b]\bigr\}.
\]
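
The self-induced diagram above suggests a simple fixed-point iteration, sketched here for the $m=1$ model. For $\phi(x,\theta)=-\log\theta+(x\theta-1)^2/2$ one has $\ddot\phi=x^2+\theta^{-2}$ and $\theta\ddot\phi-\dot\phi=2\theta^{-1}+x$, so each step is a weighted isotonic regression with weights $x^2+\psi^{-2}$ and working responses $(2\psi^{-1}+x)/(x^2+\psi^{-2})$. The code is a minimal sketch, not the implementation used for the paper's simulations: the names `weighted_pava`, `simulate` and `icm`, the random seed, the tolerance and the positivity `floor` (a crude safeguard added here) are all assumptions, and no line search is included.

```python
import random

def weighted_pava(y, w):
    """Weighted isotonic (nondecreasing) least-squares fit of y with weights w > 0."""
    blocks = []  # each block stores [weighted sum, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi * wi, wi, 1])
        # Pool while the previous block mean exceeds the current block mean.
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, t, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += t
            blocks[-1][2] += c
    fit = []
    for s, t, c in blocks:
        fit.extend([s / t] * c)
    return fit

def simulate(n, rng):
    """Draw Z ~ Uniform[1, 2] and X | Z = z with density z f(xz - 1); sort by Z."""
    data = []
    for _ in range(n):
        z = rng.uniform(1.0, 2.0)
        x = (1.0 + rng.gauss(0.0, 1.0)) / z  # makes x*z - 1 standard normal
        data.append((z, x))
    data.sort()
    return [d[0] for d in data], [d[1] for d in data]

def icm(x, start=1.5, tol=1e-8, max_iter=200, floor=1e-3):
    """Iterate the self-induced isotonic regression; x must be ordered by Z."""
    psi = [start] * len(x)
    for step in range(1, max_iter + 1):
        w = [xi * xi + 1.0 / (p * p) for xi, p in zip(x, psi)]    # second derivative of phi
        y = [(2.0 / p + xi) / wi for xi, p, wi in zip(x, psi, w)]  # psi - phi'/phi''
        new = [max(v, floor) for v in weighted_pava(y, w)]         # keep iterates positive
        change = max(abs(a - b) for a, b in zip(new, psi))
        psi = new
        if change < tol:
            break
    return psi, step

rng = random.Random(0)  # arbitrary seed, for reproducibility only
z, x = simulate(2000, rng)
psi_hat, n_steps = icm(x)
```

Because each working response depends only on the current value of $\psi$ at the same observation, the update is a Newton step projected onto the monotone cone, which is the separated structure discussed in this section. The sketch clips iterates at a small positive floor instead of using the line search of the modified ICM algorithm.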

Fig. 1. Left panel: The unconstrained estimator. Right panel: Close-up view of unconstrained and constrained estimators.

The left panel of Figure 1 shows the true function $\psi(z)=z$ and the MLE $\hat\psi_n(z)$ for the specific sample of size $n=2000$ generated from the above model. It can be seen that the estimator tracks the true function quite well apart from the endpoints where the "spiking problem" manifests itself. The right panel of Figure 1 shows the unconstrained MLE (solid line) and the constrained MLE (dashed line) computed under the (true) $H_0:\psi(1.5)=1.5$, in a neighborhood of the point 1.5, along with the true function $\psi(z)$ (the slant line). The estimators are seen to coincide outside of a small interval around 1.5. It is the difference in behavior of the estimators in this short interval that contributes to the likelihood ratio statistic. The unconstrained estimator was obtained by running the ICM (iterative convex minorant) algorithm with starting value $\psi_0$ set to be the constant function 1.5. It converged quite rapidly without resorting to the line search procedure (see [15] for an excellent description of the modified ICM algorithm that incorporates line search to guarantee convergence). The constrained MLE was computed by decomposing the likelihood maximization procedure into two parts and optimizing separately, as described in the characterization of $\hat\psi_n^0$ in Section 2. The left panel of Figure 2 shows the quantile-quantile plot of 3000 points from the distribution of the likelihood ratio statistic for $n=1000$ versus 3000 points from (a fine discrete approximation to) the distribution of $\mathbb{D}$, along with the line $y=x$. The quantile-quantile plot is in very good agreement with the line $y=x$, in conformity with the theory presented in this paper.
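
One way a discrete approximation to $\mathbb{D}=\int\{(g_{1,1}(h))^2-(g^0_{1,1}(h))^2\}\,dh$ might be generated is sketched below. It relies on assumptions that should be checked against Section 2: $g_{1,1}$ is taken to be the slope of the greatest convex minorant of $W(t)+t^2$ for two-sided Brownian motion $W$, and $g^0_{1,1}$ is taken to be the slope of the minorant computed separately to the left and right of 0, capped above by 0 on the left and below by 0 on the right. The truncation window $[-c,c]$, the grid size, the seed and the names `pava` and `draw_D` are hypothetical choices, not the paper's.

```python
import math
import random

def pava(y):
    """Isotonic (nondecreasing) least-squares fit; equals GCM slopes on an even grid."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def draw_D(rng, c=3.0, m=600):
    """One draw from a grid approximation to D on [-c, c] using 2*m cells."""
    dt = c / m
    slopes = []
    for i in range(2 * m):
        t_left = -c + i * dt
        t_right = t_left + dt
        dW = rng.gauss(0.0, math.sqrt(dt))        # Brownian increment over the cell
        dX = dW + (t_right ** 2 - t_left ** 2)    # increment of W(t) + t^2
        slopes.append(dX / dt)
    g = pava(slopes)                                     # unconstrained slopes
    g0_left = [min(v, 0.0) for v in pava(slopes[:m])]    # left of 0, capped at 0
    g0_right = [max(v, 0.0) for v in pava(slopes[m:])]   # right of 0, floored at 0
    g0 = g0_left + g0_right
    return sum(a * a - b * b for a, b in zip(g, g0)) * dt

rng = random.Random(1)  # arbitrary seed
draws = [draw_D(rng) for _ in range(20)]
```

Quantiles of many such draws could then be plotted against simulated values of the likelihood ratio statistic. The truncation to $[-c,c]$ introduces boundary effects, so this sketch is only indicative of the kind of approximation referred to above.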

Fig. 2. Left panel: Quantile-quantile plot of likelihood ratio statistic versus limiting quantiles. Right panel: Histograms of number of iterations until convergence.

An interesting fact that we now discuss is the rapid convergence of the ICM algorithm for this problem, in terms of number of steps to convergence. This is illustrated in the right panel of Figure 2. The histogram on the left is that of the number of iterations that is needed by the ICM to converge to $\hat\psi_n$ (with a tolerance of $10^{-5}$ for checking the Fenchel conditions) based on 1000 replicates for $n=100$. The histogram on the right presents the same information but for $n=10{,}000$. Despite the vast disparity in sample sizes, the histograms are very similar; fewer than 1% of the replicates require more than ten steps (of course, the actual duration of an iteration is larger for $n=10{,}000$). The fast convergence demonstrated through these histograms indicates that the performance of the ICM algorithm resembles that of the Newton algorithm, which is known to have good local convergence properties. This can be explained by the fact that the Hessian matrix of $\phi$ (minus the log-likelihood) is diagonal for this model, and indeed for the entire class of models considered in this paper, since the $u_i$'s, the arguments to $\phi$, are separated in the optimization problem (indeed, the separation of variables that we encounter in the log-likelihood also allows us to solve the constrained optimization under $H_0$ by splitting the likelihood into two different parts and optimizing separately). A similar phenomenon was observed by Jongbloed [15] in his simulation studies on Case 2 interval censoring, where, despite the nondiagonal nature of the Hessian of the log-likelihood function (this is a nonseparated problem and will be discussed shortly), the ICM algorithm converged quickly, as a consequence of the fact that there were very few off-diagonal elements in the Hessian.

The Newton behavior of the ICM algorithm for the separated models 
of this paper suggests that in these models a one-step algorithm starting 
with the true function will produce estimators which are asymptotically 
equivalent to the MLE, even if the MLE is restricted by a null hypothesis. 
This phenomenon is alluded to as the "working hypothesis" in Section 5 of 
[9] and is illustrated through a derivation of the limit distribution of the 
MLE of the survival distribution F for the current status model; in fact, the 
diagonal structure of the Hessian is used to establish the equivalence of the 
MLE with the "toy estimator" obtained by using the first iteration step of 
the ICM with the true distribution as the starting point. Thus, our results 
can be interpreted as pointing very strongly to the fact that the "working 
hypothesis" holds for the class of models considered in this paper. 
While the approach in this paper applies nicely to separated models, there are several monotone function models of considerable interest where separatedness of the arguments to the log-likelihood function cannot be achieved; consequently the Hessian is no longer diagonal. Perhaps the simplest model of this type is the Case 2 interval censoring model, where there are two observation times $(U_i,V_i)$ for each individual, and one records in which of the three mutually disjoint intervals $(0,U_i]$, $(U_i,V_i]$, $(V_i,\infty)$ the individual fails. Letting $\{T_{(i)}\}$ denote the distinct ordered values of the $2n$ observation times $\{(U_i,V_i): i=1,2,\ldots,n\}$ (here $n$ is the number of individuals being observed), and $u_i$ denote $F(T_{(i)})$, one can write down the log-likelihood for the data. It is seen that terms of the form $\log(u_i-u_j)$ immediately enter into the log-likelihood (see, e.g., [9], for a detailed treatment). One important consequence of this, in particular, is the fact that the computation of the constrained MLE of the survival distribution $F$ [under a hypothesis of the form $F(t_0)=\theta_0$] can no longer be decomposed into two separate optimization problems, in contrast to the monotone response models we have studied. Consequently, an analytical treatment of the constrained estimator will involve techniques beyond those presented in this paper. Regarding the unconstrained estimator of $F$ in this model, Groeneboom [8] uses some hard analysis to show that under a hypothesis of separation between $U$ and $V$ (the first and second observation times), the estimator converges to the truth at (pointwise) rate $n^{1/3}$, with limit distribution still given by $\mathbb{Z}$. The Case 2 model readily generalizes to the mixed case censoring model, where instead of two random observation times for every individual, the number of random times at which an individual is examined is also random. While heuristic considerations indicate that $\mathbb{D}$ should also arise as the limit distribution of the likelihood ratio statistic in these problems, the technical machinery for treating such nonseparated models in full generality remains to be developed and is left as a topic for future research.
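
To make the loss of separability concrete, the Case 2 log-likelihood can be written out for a candidate distribution function. The sketch below is illustrative only; the function name `case2_loglik`, the coding of the three intervals as 1, 2, 3 and the exponential example are assumptions introduced here, not notation from the paper.

```python
import math

def case2_loglik(F, observations):
    """Log-likelihood for Case 2 interval censored data.

    F: candidate distribution function of the failure time (a callable).
    observations: list of (u, v, k) with u < v, where k = 1, 2 or 3 records
    whether failure occurred in (0, u], (u, v] or (v, infinity).
    """
    total = 0.0
    for u, v, k in observations:
        if k == 1:
            total += math.log(F(u))
        elif k == 2:
            # This term couples the values of F at two different time points,
            # which is what makes the Hessian nondiagonal.
            total += math.log(F(v) - F(u))
        else:
            total += math.log(1.0 - F(v))
    return total

# Hypothetical example with a unit exponential failure time distribution.
F_exp = lambda t: 1.0 - math.exp(-t)
value = case2_loglik(F_exp, [(0.5, 1.0, 2)])
```

Only the middle interval produces a term involving two values of $F$ at once; the first and third cases are of the separated form treated in this paper.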

4. Discussion. In this paper, we have studied the asymptotics of like- 
lihood based inference in monotone response models. A crucial aspect of 
these models is the fact that conditional on the covariate Z, the response 
X is generated from a parametric family that is regular in the usual sense; 
consequently, the conditional score functions, their derivatives and the con- 
ditional information play a key role in describing the asymptotic behavior 
of the maximum likelihood estimates of the function $\psi$. We have also shown
that there are several monotone function models of interest that may be 
expected to exhibit asymptotically similar behavior though they are not 
monotone response models in the sense of this paper. 

A potential extension of the monotone response models of this paper 
is to semiparametric models where the infinite-dimensional component is 
a monotone function. Here is a general formulation: Consider a random 
vector $(X,W,Z)$ where $Z$ is unidimensional, but $W$ can be vector-valued. Suppose that the distribution of $X$ conditional on $(W,Z)=(w,z)$ is given by $p(x,\beta^Tw+\psi(z))$ where $p(x,\theta)$ is a one-dimensional parametric model. We are interested in making inference on both $\beta$ and $\psi$. The above formulation is fairly general and includes, for example, the partially linear regression model $X=\beta^TW+\psi(Z)+\varepsilon$ where $\psi$ is a monotone function (certain aspects of this model have been studied by Huang [12]), semiparametric logistic regression [with $X$ denoting the binary response and $(W,Z)$ covariates of interest] where the log odds of a positive outcome ($X=1$) is modeled as $\beta^TW+\psi(Z)$, and other models of interest. It is not difficult to see that the self-induced characterization will again come into play for describing the MLEs of $\psi$ in this general semiparametric setting. In the light of previous results, we expect that under appropriate conditions on $p(x,\theta)$, $\sqrt{n}(\hat\beta_{\mathrm{MLE}}-\beta)$ will converge to a normal distribution with asymptotic dispersion given by the inverse of the efficient information matrix and the likelihood ratio statistic for testing $\beta=\beta_0$ will be asymptotically $\chi^2$. The theory developed in [19, 20] should prove very useful in this regard. As far as estimation of the nonparametric component goes, $\hat\psi_n$, the MLE of $\psi$, should exhibit $n^{1/3}$ rate of convergence to a nonnormal limit and the likelihood ratio for testing $\psi$ pointwise should still converge to $\mathbb{D}$. This will be explored elsewhere and the ideas of the current paper should prove to be useful in dealing with the nonparametric component of the model.

APPENDIX 

Here, we present proofs of some selected lemmas. For proofs of the re- 
maining lemmas, see [2]. 

Proof of Lemma 2.2. It suffices to show that $B_{n,\psi}(h)$ converges to the process $aW(h)+bh^2$ in $l^\infty[-K,K]$, the space of uniformly bounded functions on $[-K,K]$ equipped with the topology of uniform convergence, for every $K>0$. We can write

\[
B_{n,\psi}(h)=\sqrt{n}(\mathbb{P}_n-P)f_{n,h}+\sqrt{n}Pf_{n,h},
\]

where $f_{n,h}(X,Z)$ is given by

\[
\frac{n^{1/6}\bigl[(\psi(Z)-\psi(z_0))\ddot\phi(X,\psi(Z))-\dot\phi(X,\psi(Z))\bigr]\bigl(1(Z\le z_n(h))-1(Z\le z_0)\bigr)}{I(\psi(z_0))p_Z(z_0)}
\]

with $z_n(h)=z_0+hn^{-1/3}$. To establish the above convergence, we invoke Theorem 2.11.22 of [24]. This requires verification of Conditions 2.11.21 and the convergence of the entropy integral in the statement of the theorem. Provided these conditions are satisfied, the sequence $\sqrt{n}(\mathbb{P}_n-P)f_{n,h}$ is asymptotically tight in $l^\infty[-K,K]$ and converges in distribution to a Gaussian process, the covariance kernel of which is given by

\[
K(s,t)=\lim_{n\to\infty}\bigl(Pf_{n,s}f_{n,t}-Pf_{n,s}Pf_{n,t}\bigr).
\]

We first compute $Pf_{n,s}f_{n,t}$. It is easy to see that this is 0 if $s$ and $t$ are of opposite signs, so we need only consider the cases where they both have the same sign. So let $s,t>0$. Then, $Pf_{n,s}f_{n,t}$ is given by

\[
E\bigl[n^{1/3}\bigl(\dot\phi(X,\psi(Z))-(\psi(Z)-\psi(z_0))\ddot\phi(X,\psi(Z))\bigr)^2\,
1(Z\in(z_0,z_0+(s\wedge t)n^{-1/3}])\bigr]\times\bigl(I(\psi(z_0))p_Z(z_0)\bigr)^{-2},
\]

which can be written as $(I(\psi(z_0))p_Z(z_0))^{-2}\,n^{1/3}\int_{z_0}^{z_0+(s\wedge t)n^{-1/3}}G(\psi(z))p_Z(z)\,dz$ where, for every $\theta$, $G(\theta)=E_\theta[\dot\phi(X,\theta)-(\theta-\theta_0)\ddot\phi(X,\theta)]^2$. On expanding the square, $G(\theta)$ simplifies to

\[
I(\theta)+(\theta-\theta_0)^2f_3(\theta,\theta)-2(\theta-\theta_0)E_\theta\bigl(\dot\phi(X,\theta)\ddot\phi(X,\theta)\bigr).
\]

As $\theta\to\theta_0=\psi(z_0)$, the first term converges to $I(\theta_0)$ by (A.4) and the second term converges to 0 by (A.6). The third term also converges to 0, by the Cauchy-Schwarz inequality. It follows that $G(\theta)$ converges to $I(\theta_0)=G(\theta_0)$. By the continuity of $\psi$ at $z_0$, we conclude that

\[
\lim_{n\to\infty}Pf_{n,s}f_{n,t}=\frac{1}{(I(\psi(z_0))p_Z(z_0))^2}\,G(\psi(z_0))\,p_Z(z_0)\,(s\wedge t)
=\frac{1}{I(\psi(z_0))p_Z(z_0)}\,(s\wedge t).
\]

It is easily shown that $Pf_{n,s}$ and $Pf_{n,t}$ both converge to 0 as $n\to\infty$, showing that for $s,t>0$, $K(s,t)=[I(\psi(z_0))p_Z(z_0)]^{-1}(s\wedge t)$. Similarly, we can show that $K(s,t)=[I(\psi(z_0))p_Z(z_0)]^{-1}(|s|\wedge|t|)$, for $s,t<0$. But this is the covariance kernel of the Gaussian process $aW(h)$ with $a=[I(\psi(z_0))p_Z(z_0)]^{-1/2}$. So the process $\sqrt{n}(\mathbb{P}_n-P)f_{n,h}$ converges in $l^\infty[-K,K]$ to the process $aW(h)$. We next show that $\sqrt{n}Pf_{n,h}\to(\psi'(z_0)/2)h^2$ uniformly on every $[-K,K]$. This implies that the process $B_{n,\psi}(h)=\sqrt{n}\,\mathbb{P}_nf_{n,h}$ converges in distribution to $X_{a,b}(h)\equiv aW(h)+bh^2$ in $l^\infty[-K,K]$. To show the convergence of $\sqrt{n}Pf_{n,h}$ to the desired limit, we restrict ourselves to the case where $h>0$; the case $h<0$ can be handled similarly. Let $\xi_n(h)=I(\psi(z_0))p_Z(z_0)\sqrt{n}Pf_{n,h}$. Then $\xi_n(h)$ is given by

\[
n^{2/3}E\bigl\{\bigl[(\psi(Z)-\psi(z_0))\ddot\phi(X,\psi(Z))-\dot\phi(X,\psi(Z))\bigr]1(z_0<Z\le z_0+hn^{-1/3})\bigr\},
\]

which reduces to $n^{2/3}E[(\psi(Z)-\psi(z_0))\ddot\phi(X,\psi(Z))1(z_0<Z\le z_0+hn^{-1/3})]$, on using the fact that $E_{\psi(z)}\dot\phi(X,\psi(z))=0$. Writing $z_n(u)$ for $z_0+un^{-1/3}$ we can express this quantity as $A+B$ where $A=\int_0^hu\,\psi'(z_0)I(\psi(z_n(u)))p_Z(z_n(u))\,du$ and

\[
B=\int_0^h\bigl[n^{1/3}(\psi(z_n(u))-\psi(z_0))-\psi'(z_0)u\bigr]I(\psi(z_n(u)))p_Z(z_n(u))\,du.
\]

The term $B$ converges to 0 uniformly for $0\le h\le K$ by the differentiability of $\psi$ at $z_0$ and $A$ can be written as $\int_0^hu\,\psi'(z_0)I(\psi(z_0))p_Z(z_0)\,du+o(1)$, where $o(1)$ goes to 0 uniformly over $h\in[0,K]$, and is readily seen to converge to $(1/2)(\psi'(z_0)I(\psi(z_0))p_Z(z_0))h^2$ uniformly on $0\le h\le K$. It follows that $\sqrt{n}Pf_{n,h}\to(\psi'(z_0)/2)h^2$ uniformly over $0\le h\le K$.

It remains to check Conditions 2.11.21. The computations here are tedious 
but straightforward, so we have omitted them (see [2] for the full details). 
Assumption (A.7), in particular, is used to verify a Lindeberg-type condition.
□ 

Proof of Lemma 2.3. We only prove the first assertion. The second one follows similarly. For the first assertion, we write the proof for $h>0$; the proof for $h<0$ is similar. So, let $0<h\le K$. Recall that $B_{n,\hat\psi_n}(h)$ is given by

\[
Cn^{2/3}\,\mathbb{P}_n\bigl[\{(\hat\psi_n(Z)-\psi(z_0))\ddot\phi(X,\hat\psi_n(Z))-\dot\phi(X,\hat\psi_n(Z))\}
\times1(Z\in(z_0,z_0+hn^{-1/3}])\bigr],
\]

where $C$ is a constant, and $B_{n,\psi}(h)$ has the same form as above but with $\hat\psi_n$ replaced by $\psi$. Now, for any $Z\in(z_0,z_0+Kn^{-1/3}]$ we can write $\dot\phi(X,\psi(z_0))$ as

\[
\dot\phi(X,\psi(Z))+\ddot\phi(X,\psi(Z))(\psi(z_0)-\psi(Z))+\tfrac{1}{2}\phi'''(X,\psi^*(Z))(\psi(Z)-\psi(z_0))^2,
\]

for some point $\psi^*(Z)$ between $\psi(Z)$ and $\psi(z_0)$. We can also write $\dot\phi(X,\psi(z_0))$ as

\[
\dot\phi(X,\hat\psi_n(Z))+\ddot\phi(X,\hat\psi_n(Z))(\psi(z_0)-\hat\psi_n(Z))+\tfrac{1}{2}\phi'''(X,\psi_n^*(Z))(\hat\psi_n(Z)-\psi(z_0))^2,
\]

for some point $\psi_n^*(Z)$ between $\hat\psi_n(Z)$ and $\psi(z_0)$. It follows that we can write $B_{n,\psi}(h)-B_{n,\hat\psi_n}(h)$ as

\[
\begin{aligned}
&\frac{C}{2}\,\mathbb{P}_n\bigl[\{n^{1/3}(\psi(Z)-\psi(z_0))\}^2\phi'''(X,\psi^*(Z))1(Z\in(z_0,z_0+hn^{-1/3}])\bigr]\\
&\quad-\frac{C}{2}\,\mathbb{P}_n\bigl[\{n^{1/3}(\hat\psi_n(Z)-\psi(z_0))\}^2\phi'''(X,\psi_n^*(Z))1(Z\in(z_0,z_0+hn^{-1/3}])\bigr].
\end{aligned}
\]

We will show that the second term in the above display converges to 0 uniformly in $h$; the proof for the first term is similar. Up to a constant, the second term is bounded in absolute value by
\[
\mathbb{P}_n\bigl[\{n^{1/3}(\hat\psi_n(Z)-\psi(z_0))\}^2\,|\phi'''(X,\psi_n^*(Z))|\,1(Z\in(z_0,z_0+Kn^{-1/3}])\bigr].
\]

Denote the random function inside square brackets by $\xi_n$. For any $z\in(z_0,z_0+Kn^{-1/3}]$, we have

\[
\bigl[n^{1/3}(\hat\psi_n(z)-\psi(z_0))\bigr]^2\le\bigl(n^{1/3}(\hat\psi_n(z_0+Kn^{-1/3})-\psi(z_0))\bigr)^2
+\bigl(n^{1/3}(\hat\psi_n(z_0-Kn^{-1/3})-\psi(z_0))\bigr)^2,
\]

which with arbitrarily high probability is eventually bounded by a constant $C$ (by Lemma 2.1). Also, since for any such $z$, $\psi_n^*(z)$ converges in probability to $\psi(z_0)$, with arbitrarily high probability $|\phi'''(X,\psi_n^*(Z))|$ is eventually bounded by $B(X)$ [by assumption (A.5)]. It follows that with arbitrarily high probability the random function $\xi_n$ is eventually bounded up to a constant by $B(X)1(Z\in[z_0,z_0+Kn^{-1/3}])$. Hence, eventually, with arbitrarily high probability,

\[
\begin{aligned}
\mathbb{P}_n(\xi_n)&\le C(\mathbb{P}_n-P)\bigl[B(X)1(Z\in[z_0,z_0+Kn^{-1/3}])\bigr]\\
&\quad+CP\bigl[B(X)1(Z\in[z_0,z_0+Kn^{-1/3}])\bigr],
\end{aligned}
\]

for some constant $C$. The first term on the right-hand side is $o_p(1)$ using straightforward Glivenko-Cantelli type arguments and the second term is seen to go to 0 by direct computation. This shows that the second term goes to 0 uniformly in $h$. □